18:11
2026-06-16
lesswrong.com
artificial-intelligence
1 Layer Induction Heads and Some Research
Researchers challenge the established belief that induction heads require two layers in transformer architectures, arguing that the phenomenon may be attributable to two attention heads rather than tw…